Abstract

This  technical  report  summarizes  the  intrinsic  sensor  parameters  for  the  range,  IR  and 
color  sensor  used  in  the  Fort  Carson  data  collection  and  explores  sensor-to-sensor  image 
mapping  under  different  assumptions  regarding  relative  sensor  placement.  The  default 
case  is  to  assume  perfectly  boresight  aligned  placement,  and  then  the  implications  of 
different  deviations  from  this  perfect  placement  are  considered.  Included  in  this  report 
is  a  description  of  the  calibration  process  used  to  recover  the  color  sensor  parameters. 
A  key  result  shown  in  detail  is  the  relative  equivalence  of  planar  sensor  translation  and 
small  angle  pan  and  tilt  for  points  of  known  depth.  This  simplifying  approximation  has 
significant  implications  when  fusing  data  from  separate  range  and  optical  sensors. 


*This work was sponsored by the Defense Advanced Research Projects Agency (DARPA) Image Understanding Program
under grants DAAH04-93-G-422 and DAAH04-95-1-0447, monitored by the U. S. Army Research Office.

1 Introduction


In  mapping  data  from  an  optical  sensor  onto  a  range  sensor,  such  as  a  LADAR,  intrinsic  sensor  calibration 
as  well  as  extrinsic  sensor  pose  must  be  taken  into  account.  This  paper  looks  at  various  aspects  of  the 
relative  geometric  transformations  between  sensors  which  are  nearly,  but  not  completely,  boresight  aligned. 
The practical motivation for this work is to explore how these mappings change given different types
of  deviation  from  the  perfect  case  and  from  this  exploration  draw  some  conclusions  about  when  certain 
simplifying  assumptions  can  be  used  without  introducing  excessive  error  into  the  sensor-to-sensor  pixel 
mapping. 

The  intention  is  to  present  this  material  at  a  detailed,  nearly  tutorial  level.  All  but  the  section  on 
calibration  should  be  readily  understood  by  any  reader  who  possesses  basic  knowledge  of  trigonometry 
and linear algebra. Consequently, there is a risk the presentation may appear a bit labored to the reader
intimately familiar with these issues. Such readers may either skip the background material or
read it quickly.

Several  highly  practical  considerations  have  brought  this  paper  into  existence.  First,  much  of  the  work 
at  Colorado  State  on  multi-sensor  fusion  [SB94,  BHP95,  J.  96,  Ant96]  has  been  implicitly  using  assumptions 
fleshed  out  and  tested  in  this  paper.  The  foremost  such  assumption  is  that  changes  in  pixel  mappings 
between sensors induced by small rotations of one sensor relative to another may, under a limited set of
conditions, be expressed as planar translation of one image relative to another. To many familiar with the
geometry involved the conclusion is self-evident. However, it is important to understand just how good
an  approximation  this  is  and  thus  what  magnitude  of  error  to  expect  under  different  practical  operating 
conditions.  One  way  to  view  this  technical  report  is  as  a  long  tutorial  working  up  to  Section  6  which  answers 
this  question. 

Another  motivation  for  this  technical  report  is  to  better  record  and  understand  the  characteristics  of  the 
sensors used in the Fort Carson data collection [BPY94]. In November of 1993 Colorado State University,
Alliant Techsystems and Martin Marietta jointly collected a set of range, IR and color data at the Colorado
National Guard Facility at Fort Carson, Colorado. Over 400 range images were collected in such a manner as to
approximate  3  boresighted  sensors.  This  technical  report  contains  estimates  of  the  intrinsic  sensor  parameters 
for  the  Fort  Carson  data  and  these  sensors  are  used  for  illustration  throughout  the  report.  This  data  is  now 
publicly available and may be downloaded from our web site: http://www.cs.colostate.edu/~vision/.
Anyone  using  this  data  may  find  this  report  helpful. 

2  Overview 

Section  3  reviews  and  defines  the  intrinsic  parameters  of  a  perspective  or  pin  hole  camera  model.  It  also 
presents  two  ways  of  deriving  these  intrinsic  parameters.  The  first  and  obviously  superior  way  is  through 
calibration  and  Section  3.2  presents  details  on  the  exact  calibration  technique  used  to  recover  the  intrinsic 
parameters  for  the  color  data  collected  at  Fort  Carson.  The  second  way  is  to  compute  them  from  information 
commonly  provided  by  a  manufacturer,  and  this  is  reviewed  in  Section  3.3. 

Section  4  lays  to  rest  a  key  detail  relating  to  some  range  sensors  including  the  one  used  in  the  Fort  Carson 
data  collection.  Based  upon  the  physics  of  the  actual  range  sensor,  we  have  been  told  that  a  spherical  mapping 
is a more accurate description of the image, i.e. the pixel spacing is uniform in angle. In this section the
spherical  mapping  is  compared  to  the  most  closely  equivalent  pin  hole  camera  model  and  it  is  concluded  the 
difference  in  pixel  mappings  for  common  points  in  the  world  never  exceeds  0.15  pixel  units  for  the  LADAR 
used  at  Fort  Carson.  Based  upon  this  analysis,  we  conclude  that  the  pin-hole  model  is  perfectly  acceptable 
for  this  sensor  and  that  the  distinction  does  not  matter  for  the  field  of  view  and  pixel  resolution  in  question. 

Section 5 takes up the key question of how different deviations from perfect boresight alignment alter
the mapping between sensor image planes. If one sensor rotates about the optical axis relative to
the other, the mapping between sensors remains 2D affine for all points in the world. When the horizontal
and vertical scale factors are identical no warping is involved and the rotation angle is preserved in the 2D
mapping. If one sensor translates forward or backward relative to the other, there is no single 2D affine
mapping for all 3D points. There is a mapping involving scaling and translation in the image plane where
the  translation  is  independent  of  depth  and  the  scaling  is  dependent  upon  depth.  If  one  sensor  translates 
relative  to  another  in  a  common  image  plane,  again  the  2D  mapping  is  not  independent  of  the  depth  of 
points  in  the  world.  In  this  case,  there  is  scale  change  which  is  independent  of  depth  and  the  translation 
term  depends  upon  depth  to  the  points.  Finally,  the  case  of  one  sensor  rotating  about  the  horizontal  and 
vertical  axes  relative  to  another  is  considered.  Under  these  conditions,  the  2D  mapping  between  images 
becomes  quite  complex  and  dependent  upon  the  full  3D  coordinates  of  the  point  being  viewed.  Unlike  the 
previous cases, it is no longer helpful to attempt to derive a 2D mapping between images.

Understanding  that  minor  shifts  in  pan  and  tilt  angles  of  one  sensor  relative  to  another  introduce  a 
quite  complicated  mapping  between  image  spaces,  it  becomes  interesting  to  ask  under  what  conditions  such 
rotations  may  be  reasonably  approximated  by  the  much  simpler  case  of  translation  in  a  common  image  plane. 
Section  6  works  up  an  analysis  which  allows  us  to  answer  this  question.  In  this  analysis,  two  sensors  are 
coupled  so  as  to  track  a  common  reference  point  at  a  fixed  depth.  To  accomplish  this,  one  rotates  and  the 
other  translates.  As  is  perhaps  not  surprising,  the  translation  approximates  the  rotation  almost  perfectly 
for  points  at  the  tracking  depth.  When  working  within  a  narrow  depth  of  field  about  the  tracking  depth, 
the  approximation  is  likewise  very  good  and  actual  values  are  presented  in  Section  6.  Finally,  for  all  depths 
beyond  the  tracking  depth,  the  approximation  introduces  error.  However,  this  error  grows  quickly  and  then 
begins  to  approach  an  upper  bound.  Thus,  for  points  beyond  the  tracking  point,  the  pixel-to-pixel  error  for 
practical  purposes  is  bounded. 

3  Optical  Sensor  Geometry 

Let us review the basics of 3D projection as performed with a projective camera. The key mapping is between
3D points and their projection on the 2D image plane. Many texts treat this topic [FD82]. One of the most
compact  and  simplest  ways  of  expressing  the  3D  to  2D  relationship  closely  follows  concepts  developed  in 
projective  geometry.  The  following  is  a  general  equation  for  projection. 

$$ I = P\,W, \qquad P = \begin{bmatrix} s_u & 0 & t_u & 0 \\ 0 & s_v & t_v & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix}, \qquad W = \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \tag{1} $$

where $I$ is a point on the projective image plane, $P$ is the projection matrix and $W$ is the 3D point being
imaged.

There  is  a  marvelous  trick  implicit  in  this  technique  which  makes  the  non-linear  perspective  mapping 
amenable  to  a  simple  linear  algebraic  form.  This  trick  is  actually  quite  proper  and  rigorous  in  terms  of 
projective  geometry  and  is  nicely  explained  in  [Fau93].  From  a  mechanical  standpoint,  simply  observe  that 
expanding  out  the  matrix  multiplication  yields: 


$$ I = \begin{bmatrix} s_u X + t_u Z \\ s_v Y + t_v Z \\ Z \end{bmatrix} \tag{2} $$

The 2D point $I$ is represented in projective coordinates in which there are an infinite number of ways to
express a single point.

$$ I \equiv \alpha I \quad \forall \alpha \neq 0 \tag{3} $$


Given this redundancy, a normalized form is selected in which the third element must equal 1. Applying
this normalization to $I$ yields

$$ I = \begin{bmatrix} s_u \frac{X}{Z} + t_u \\ s_v \frac{Y}{Z} + t_v \\ 1 \end{bmatrix} \tag{4} $$

The terms $X/Z$ and $Y/Z$ may now be recognized as the ratios which are commonly used to define or explain perspective
projection.

The four terms $s_u$, $s_v$, $t_u$ and $t_v$ define the intrinsic characteristics of the optical camera. The terms $s_u$
and $s_v$ are typically called the scale factors and they encode both the focal length of the sensor as well as
the pixel sampling dimension. It is equally correct to interpret these parameters as the size of the horizontal
and vertical focal lengths measured in pixel units [Fau93]. The terms $t_u$ and $t_v$ represent the coordinates of
the point where the optical axis pierces the image plane as measured in pixel units. The issue of recovering
these intrinsic parameters from known calibration targets is taken up in the following section.
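
The projection and normalization steps can be sketched in a few lines of Python. This is a minimal illustration, assuming the 3x4 projection matrix of equation 1 built from the four intrinsic terms; the helper names `intrinsic_matrix` and `project` are hypothetical, and the numeric values are the 720 x 480 entries of Table 1.

```python
def intrinsic_matrix(s_u, s_v, t_u, t_v):
    """Build the 3x4 projection matrix P from the four intrinsic terms."""
    return [[s_u, 0.0, t_u, 0.0],
            [0.0, s_v, t_v, 0.0],
            [0.0, 0.0, 1.0, 0.0]]


def project(P, point):
    """Compute I = P W for W = (X, Y, Z, 1), then normalize the third element to 1."""
    X, Y, Z = point
    W = (X, Y, Z, 1.0)
    I = [sum(p * w for p, w in zip(row, W)) for row in P]
    if I[2] == 0:
        raise ValueError("points with Z = 0 have no finite image")
    return (I[0] / I[2], I[1] / I[2])


# Intrinsic values taken from the cropped (720 x 480) row of Table 1.
P = intrinsic_matrix(978.081, 947.117, 345.036, 227.794)
u, v = project(P, (1.0, 2.0, 10.0))
on_axis = project(P, (0.0, 0.0, 5.0))
```

A point on the optical axis (X = Y = 0) lands on the principal point, which is a quick sanity check on any set of intrinsic parameters.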

3.1  Calibration  from  Calibration  Targets 

The calibration work for the color imagery was performed by Zhongfei Zhang at the University of Massachusetts
using a method developed earlier at UMass by Yong-Qing Cheng et al. [CCHR94]. The resulting
intrinsic  parameters  are  presented  in  Table  1.  There  is  a  minor  inconsistency  between  the  imagery  used 
for  calibration  and  that  typically  distributed  with  the  Fort  Carson  dataset:  the  latter  has  been  cropped  to 
the  center  720x480  pixels.  This  does  not  alter  the  scale  factors,  but  does  alter  the  image  center  by  half  the 
cropping  margin.  This  adjustment  is  reflected  in  the  last  row  of  Table  1. 


Image          Scale Factor           Principal Point
Resolution     s_u        s_v         t_u        t_v
767 x 512      978.081    947.117     368.536    243.794
720 x 480      978.081    947.117     345.036    227.794

Table 1: Scale factors and image center for Fort Carson Imagery
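
The image-center adjustment for the cropped imagery is simple arithmetic, sketched below. The sketch assumes the crop is centered, so that half of the removed margin comes off each side; the function name `crop_center` is hypothetical.

```python
def crop_center(t, full_size, cropped_size):
    """Shift a principal-point coordinate by half the cropping margin (centered crop assumed)."""
    margin = full_size - cropped_size
    return t - margin / 2.0


# Values from the first row of Table 1 (767 x 512 imagery).
t_u_cropped = crop_center(368.536, 767, 720)
t_v_cropped = crop_center(243.794, 512, 480)
```

The two results reproduce the principal point listed for the 720 x 480 imagery, while the scale factors are left untouched because cropping does not resample the pixels.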


3.2  Details  of  The  Calibration  Method 

This  section  describes  briefly  how  these  parameters  were  estimated.  Sensor  calibration  is  a  rather  complicated 
topic  and  this  section  will  not  attempt  the  same  level  of  tutorial  presentation  used  elsewhere  in  this  report. 
Readers  unfamiliar  with  calibration  are  encouraged  to  see  [Gan84,  LT86,  STH80]. 

As laid out above, the camera model used in this work is assumed to be pinhole and the underlying
mathematical model is a perspective transformation. Consider the case with $m$ camera positions and $n$
3D points $P_1(x_1, y_1, z_1), \ldots, P_n(x_n, y_n, z_n)$ in the world coordinate system. For the $j$th position, there are $n$
corresponding image points $Q_1^{(j)}(u_1^{(j)}, v_1^{(j)}), \ldots, Q_n^{(j)}(u_n^{(j)}, v_n^{(j)})$ $(j = 1, 2, \ldots, m)$.

Assume that the relationship between the world coordinate system and the camera coordinate system
at camera position $j$ is:

$$ P_i^{(j)} = R^{(j)} P_i + T^{(j)} \tag{5} $$

where $P_i^{(j)} = (x_i^{(j)}, y_i^{(j)}, z_i^{(j)})^T$ is the 3D coordinate vector of the $i$th point in the $j$th camera coordinate system,
$R^{(j)}$ is the $3 \times 3$ rotation matrix from the world coordinate system to the $j$th camera coordinate
system, and $T^{(j)}$ is the translation vector from the world coordinate system to the $j$th camera
coordinate system. These assumptions, together with equation 1, lead to the following set of
constraint equations:

$$ \hat{u}_i^{(j)} = s_u \frac{x_i^{(j)}}{z_i^{(j)}} + t_u, \qquad \hat{v}_i^{(j)} = s_v \frac{y_i^{(j)}}{z_i^{(j)}} + t_v \tag{6} $$

where $\hat{u}_i^{(j)}$ and $\hat{v}_i^{(j)}$ are image projections of the corresponding 3D points. The camera internal parameters
are $s_u, s_v, t_u, t_v$, and the external parameters are $R^{(j)}$ and $T^{(j)}$ $(j = 1, \ldots, m)$. Thus, taking the difference between
projected and actual observed image points, we form the following general aggregate sum-of-squares of the
residuals over $m$ images:

$$ E_m = \sum_{j=1}^{m} \sum_{i=1}^{n} \left[ \frac{\left(u_i^{(j)} - \hat{u}_i^{(j)}\right)^2}{\sigma_u^2} + \frac{\left(v_i^{(j)} - \hat{v}_i^{(j)}\right)^2}{\sigma_v^2} \right] \tag{7} $$

where $\sigma_u$ and $\sigma_v$ are deviations along the two image axes, and they are set to 1 in all the tests.
The function $E_m$ depends nonlinearly on $P_i$ and the $4 + 6 \times m$ parameters in $a$, where

$$ a = \left(s_u, s_v, t_u, t_v, R^{(1)}, T^{(1)}, \ldots, R^{(m)}, T^{(m)}\right) \tag{8} $$

Note that although $R^{(j)}$ has nine parameters in total, they only count as three independent parameters,
since there are six constraints imposed on the nine parameters in order to form a rotation matrix. See
Appendix A of [Kum92] for the details of those constraints.

The function $E_m$ may be expressed as the second-order Taylor series expansion

$$ E_m(a + \Delta a) \approx E_m(a) + \nabla E_m^T \Delta a + \frac{1}{2} \Delta a^T H \Delta a \tag{9} $$

where $a$ is the initial guess and $\Delta a$ is a small correction to $a$, $\nabla E_m = \partial E_m / \partial a$ is the gradient of the objective
function with respect to $a$, and $H = \partial^2 E_m / \partial a^2$ is the second derivative matrix (Hessian matrix) of the
objective function.

Here, the Levenberg-Marquardt algorithm, a robust algorithm to solve nonlinear systems developed
by Levenberg and Marquardt [Mar63, PFTV88], is used to compute the camera parameters $a$. In the
Levenberg-Marquardt method, we have

$$ \Delta a = -(H + \lambda I)^{-1} \nabla E_m \tag{10} $$

where $\lambda$ is a conditioning factor and $I$ is an identity matrix.
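
To make the update of equation 10 concrete, the following Python sketch applies damped steps to a deliberately small problem: recovering one scale factor and one translation term from noise-free 1D projections u = s (X/Z) + t. It is not the calibration code used for the Fort Carson data. The names `lm_step` and `fit`, the starting guess, the synthetic samples, the use of the Gauss-Newton approximation of the Hessian, and the rule for raising or lowering the conditioning factor are all assumptions of this sketch.

```python
def residuals(a, samples):
    """Residuals (s * x + t) - u for the toy model u = s * (X/Z) + t."""
    s, t = a
    return [s * x + t - u for x, u in samples]


def cost(a, samples):
    """Sum of squared residuals (both sigmas set to 1)."""
    return sum(r * r for r in residuals(a, samples))


def lm_step(a, samples, lam):
    """One update delta = -(H + lam * I)^-1 * gradient for the parameters (s, t)."""
    r = residuals(a, samples)
    # Gradient of the sum of squares: 2 * J^T r, with Jacobian rows (x, 1).
    g0 = 2.0 * sum(x * ri for (x, _), ri in zip(samples, r))
    g1 = 2.0 * sum(r)
    # Gauss-Newton approximation of the Hessian, 2 * J^T J (exact for this linear model).
    h00 = 2.0 * sum(x * x for x, _ in samples)
    h01 = 2.0 * sum(x for x, _ in samples)
    h11 = 2.0 * len(samples)
    # Solve the damped 2x2 system in closed form.
    a00, a11 = h00 + lam, h11 + lam
    det = a00 * a11 - h01 * h01
    d0 = -(a11 * g0 - h01 * g1) / det
    d1 = -(a00 * g1 - h01 * g0) / det
    return (a[0] + d0, a[1] + d1)


def fit(samples, a0, lam=1e-3, iterations=50):
    """Accept a step and relax the damping when the cost drops; otherwise increase the damping."""
    a, best = a0, cost(a0, samples)
    for _ in range(iterations):
        trial = lm_step(a, samples, lam)
        trial_cost = cost(trial, samples)
        if trial_cost < best:
            a, best, lam = trial, trial_cost, lam / 10.0
        else:
            lam *= 10.0
    return a


TRUE_S, TRUE_T = -978.0, 345.0
samples = [(x, TRUE_S * x + TRUE_T) for x in (-0.3, -0.2, -0.1, 0.0, 0.1, 0.2, 0.3)]
estimate = fit(samples, (-900.0, 300.0))
```

A large conditioning factor shortens the step toward a cautious gradient-descent move, while a small one approaches the plain Newton step; the accept-or-reject rule above is one common way to choose between the two.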

Due to tracking in the image, without using special patterns, some 2D-3D measurements and correspondences
may be incorrect. In these cases, the underlying noise in the 2D and 3D data may not be Gaussian.
Hence, gross errors or outliers may occur. In order to deal with gross errors or outliers in the 2D and 3D
data, the following least median of squares (LMS) estimator is used. It has been proved that the following
minimization always leads to a solution [RL87]:

$$ \text{Minimize} \quad \operatorname{median}_j \; r_j^2 \tag{11} $$

where $r_j$ is the residual of the $j$th data point. Since the median is not differentiable, the objective in
equation 11 must be minimized using combinatorial methods such as
subsampling. The algorithm based on the least median of squares technique is proposed as follows:


(a) Select $l$ random subsets of size $k$ from the input data.

(b) For each subsample $S_i$, determine the camera parameters $a$ by using the Levenberg-Marquardt
algorithm. Estimate the residual error for all $n$ points given the camera parameters and find the
median square error.

(c) Select the camera parameters which give the minimum median error and compute the scale $s$
using equation:


$$ s = \operatorname{median}_j \tag{12} $$

(d) Filter out as outliers those points whose squared residual error from the camera parameters is greater than
$(\alpha s)^2$; $\alpha$ is an algorithm parameter and is set equal to 1.5 for all tests.

(e)  Minimize  the  error  function  given  in  equation  11  on  the  remaining  points  using  the  above  algorithm 
and  return  the  estimated  camera  parameters  as  the  final  output. 
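
Steps (a) through (e) can be sketched on a toy problem. The Python below fits a line u = s x + t to data containing three gross outliers; it illustrates the subsampling structure only and is not the camera calibration itself. Several choices are assumptions of this sketch: an exact two-point fit stands in for the Levenberg-Marquardt fit of step (b), ordinary least squares stands in for the final minimization of step (e), the scale uses the estimate 1.4826 (1 + 5/(n - k)) sqrt(median) familiar from the robust regression literature, and the function names, data, subset count and seed are hypothetical.

```python
import math
import random
import statistics

SUBSET_SIZE = 2  # k: two points determine a line exactly


def fit_line_exact(p, q):
    """Line through two points with distinct x values."""
    (x1, u1), (x2, u2) = p, q
    s = (u2 - u1) / (x2 - x1)
    return s, u1 - s * x1


def fit_line_least_squares(points):
    """Ordinary least-squares line fit."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_u = sum(u for _, u in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxu = sum((x - mean_x) * (u - mean_u) for x, u in points)
    s = sxu / sxx
    return s, mean_u - s * mean_x


def squared_residuals(model, points):
    s, t = model
    return [(s * x + t - u) ** 2 for x, u in points]


def lms_fit(points, n_subsets=50, alpha=1.5, seed=0):
    rng = random.Random(seed)
    n = len(points)
    best_model, best_median = None, float("inf")
    for _ in range(n_subsets):                                   # step (a)
        subset = rng.sample(points, SUBSET_SIZE)
        model = fit_line_exact(*subset)                          # step (b)
        med = statistics.median(squared_residuals(model, points))
        if med < best_median:                                    # step (c)
            best_model, best_median = model, med
    # Assumed scale estimate; not taken from the report's equation 12.
    scale = 1.4826 * (1.0 + 5.0 / (n - SUBSET_SIZE)) * math.sqrt(best_median)
    limit = (alpha * scale) ** 2
    errors = squared_residuals(best_model, points)
    inliers = [p for p, e in zip(points, errors) if e <= limit]  # step (d)
    return fit_line_least_squares(inliers), inliers              # step (e)


OUTLIER_INDICES = {3, 11, 17}
points = []
for i in range(20):
    noise = 0.01 * ((2 * i) % 5 - 2)          # small deterministic perturbation
    gross = 50.0 if i in OUTLIER_INDICES else 0.0
    points.append((float(i), 2.0 * i + 1.0 + noise + gross))

model, inliers = lms_fit(points)
```

Because the median ignores the largest half of the residuals, a subset that happens to contain only good points yields a small median even when the full data set contains gross errors, which is what allows step (d) to separate the outliers.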

The  calibration  procedure  is  as  follows.  We  take  three  images  of  a  fixed  calibration  pattern  with  different 
depth  distances  from  the  camera.  Then  we  extract  a  set  of  points  in  each  image,  and  measure  the  3D 
coordinates  of  those  points  relative  to  some  arbitrary  3D  coordinate  system.  Finally,  the  2D  image  points 
and  3D  world  points  are  input  into  this  algorithm  to  compute  the  internal  parameters.  In  this  case  (one  of 
the  images  is  shown  in  Figure  1),  the  (0,0,0)  point  is  the  lower  corner  of  the  intersection  between  the  two 
boards,  and  the  known  quantities  are  the  radius  of  the  circles  and  the  spacing  between  the  circles  (in  3D). 
The  internal  parameters  estimated  for  the  Fort  Carson  image  data  using  this  method  are  recorded  in  Table 
1. 


Figure  1:  Geometric  Camera  Calibration  Target  for  Fort  Carson  Data. 

To compare the performance of this algorithm with that of others, we use the
same data points for Ganapathy's [Gan84] and Crowley et al.'s [CB93] algorithms. The following table lists
the total error projected back to the image plane for each algorithm, respectively. The total error is the sum
over all the points used in the calibration procedure (n = 18) of the error in each point.


Total Errors for the Same Point Set (unit: pixel)

Ours      Ganapathy    Crowley
0.620     1.230        6.399


3.3  Intrinsic  Parameters  from  Field  of  View  and  Image  Size 

Calibration as described above is clearly superior to simply estimating intrinsic parameters based upon
general information provided by manufacturers. However, there are times when calibration is impractical
and one must make an intelligent guess based upon commonly provided information such as the vertical and
horizontal field of view alone.

The mapping is rather straightforward, but for the sake of completeness it is presented here. Table 2
presents the conclusions drawn from this section relating field of view, image size and intrinsic parameters
for each sensor used in the Fort Carson data collection. It lists the fields of view, pixel dimensions
of the images, and the intrinsic parameters defined earlier. For the color and IR sensors, two entries are
listed. The first entry for color lists the parameters derived from the calibration procedure described in the
previous section. The second lists parameters derived from the specifications of the color sensor, specifically
the knowledge that the film is 36mm wide and 24mm high, and that a 50mm lens was used.


                        FOV_h           FOV_v         Dimensions     Scale            Center
Sensor                  Rad.    Deg.    Rad.    Deg.  d_u    d_v     s_u     s_v      t_u     t_v
Color (from Cal.)       0.705   40.4    0.496   28.4  720    480     -978    947      [?]     [?]
Color (from Specs.)     0.691   39.6    0.471   27.0  720    480     -1000   1000     359.5   239.5
FLIR (Visual Cal.)      0.434   24.9    [?]     23.0  [?]    [?]     [?]     [?]      127.5   [?]
FLIR (from Specs)       0.419   24.0    [?]     24    [?]    [?]     [?]     [?]      127.5   [?]
Range                   0.271   15.5    0.060   3.4   120    24      [?]     [?]      59.5    11.5

Table 2: Intrinsic Sensor Parameters for Fort Carson Color, IR and Range.


One of the two entries for the FLIR is based upon the manufacturer's specified field of view for the
FLIR. The other incorporates a correction generated by hand based upon visual appearance of modeled
3D objects in both range and IR. This was done interactively using our own multi-sensor visualization
software [GBSF95, GBSF94]. With this software, it is possible to first align a 3D object model with range
data and also with IR using the manufacturer's specifications for the IR sensor. Then a user can adjust the
scale  factors  so  as  to  make  the  projection  of  the  object  model  more  precisely  match  the  appearance  of  the 
object  in  IR.  This  process  is  certainly  not  assured  of  generating  the  true  intrinsic  parameters,  but  it  will 
generate  compatible  range  and  IR  parameters  for  objects  at  similar  depth. 

The parameters for the LADAR range sensor are based upon field calibration using calibrated imagery.
To calibrate the horizontal angular pixel resolution, the sensor was field tested viewing a pair of survey
markers 50 feet apart at 184 feet from the sensor. To calibrate the vertical angular resolution, the sensor
viewed two vertical markers 3.7 feet apart at 157 feet. The maximum range measured by the LADAR is
1074 feet, and hence multiplying a raw pixel value by the ratio 1074/4095 yields a range measurement in
feet. The standard deviation $\sigma$ of the range measurement is approximately 1 foot [Bel93]. More will be said about the geometry
of the LADAR in the following section.
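
The two quantities used in this field calibration are easy to compute. The Python sketch below converts a raw LADAR pixel value to feet using the 1074/4095 ratio and computes the angle subtended by a pair of markers at a known distance. The function names are hypothetical, and the subtended-angle formula assumes the markers are centered on and perpendicular to the line of sight, which the text does not state.

```python
import math

MAX_RANGE_FEET = 1074.0
MAX_RAW_VALUE = 4095


def raw_to_feet(raw):
    """Convert a raw range pixel value (0..4095) to a range in feet."""
    if not 0 <= raw <= MAX_RAW_VALUE:
        raise ValueError("raw range value out of bounds")
    return raw * MAX_RANGE_FEET / MAX_RAW_VALUE


def subtended_angle(separation, distance):
    """Angle in radians subtended by two markers centered on the line of sight."""
    return 2.0 * math.atan(separation / (2.0 * distance))


horizontal_angle = subtended_angle(50.0, 184.0)   # survey markers
vertical_angle = subtended_angle(3.7, 157.0)      # vertical markers
```

Dividing the number of pixels spanned by the markers in an image by the subtended angle gives an angular resolution in pixels per radian.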

In going back and forth between alternative ways of describing a sensor, a minor point where confusion
can arise concerns the exact 'position' of a pixel. The convention used here is that the pixel centers are
points on the U, V image plane with integer coordinates. When drawing an image as a grid, this convention
means the pixel centers fall at intersections of grid lines. This relationship is illustrated in Figure 2.

When going from a specification of the field of view and image dimensions, an assumption about the
optical center must be made. Considering an image with dimensions $(d_u, d_v)$, the obvious default assumption
is to place the optical center at the center of the image.


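$$ c = \begin{bmatrix} c_u \\ c_v \\ 1 \end{bmatrix} = \begin{bmatrix} \frac{d_u - 1}{2} \\ \frac{d_v - 1}{2} \\ 1 \end{bmatrix} \tag{13} $$
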

The terms $t_u$ and $t_v$ in the projection matrix defined in equation 1 are the coordinates of the image center
and therefore two of the four intrinsic parameters are now known.

$$ \begin{bmatrix} t_u \\ t_v \\ 1 \end{bmatrix} = \begin{bmatrix} c_u \\ c_v \\ 1 \end{bmatrix} \tag{14} $$


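So long as the optical center and image center coincide, the scale factors are also easily computed. The key
observation is that half the pixels span half the field of view angle. Writing $f_u$ and $f_v$ for the horizontal
and vertical fields of view, it follows that:

$$ f_u = 2 \arctan\left(\frac{d_u - 1}{2 s_u}\right), \qquad f_v = 2 \arctan\left(\frac{d_v - 1}{2 s_v}\right) \tag{15} $$

Inverting this relationship yields expressions for the scale factors in terms of the dimensions and field of view.

$$ s_u = \frac{d_u - 1}{2 \tan\left(\frac{f_u}{2}\right)}, \qquad s_v = \frac{d_v - 1}{2 \tan\left(\frac{f_v}{2}\right)} \tag{16} $$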


This  is  the  clearest  and  simplest  way  to  solve  for  the  scale  factors.  However,  it  does  not  completely  address 
the  relationships  between  points  in  3D  and  their  mappings  to  points  in  the  UV  image  plane.  There  is  an 
alternative  derivation  which,  while  more  cumbersome,  also  adds  some  additional  insight  into  the  relationships 
involved. 
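
The relationship between field of view, image dimension and scale factor can be checked numerically. The Python sketch below assumes the optical center sits at the image center and that pixel centers have integer coordinates, so that half the field of view spans (d - 1)/2 pixels; the function names are hypothetical. It recomputes the color sensor fields of view from the calibrated scale factors of Table 1.

```python
import math


def fov_from_scale(s, d):
    """Field of view in radians from a scale factor (pixels) and an image dimension (pixels)."""
    return 2.0 * math.atan((d - 1) / (2.0 * abs(s)))


def scale_from_fov(fov, d):
    """Magnitude of the scale factor from a field of view (radians) and an image dimension."""
    return (d - 1) / (2.0 * math.tan(fov / 2.0))


# Calibrated color scale factors from Table 1 with the 720 x 480 cropped imagery.
fov_h = fov_from_scale(978.081, 720)
fov_v = fov_from_scale(947.117, 480)
fov_h_deg = math.degrees(fov_h)
fov_v_deg = math.degrees(fov_v)
```

The results round to the 40.4 and 28.4 degree entries listed for the calibrated color sensor in Table 2, and applying `scale_from_fov` to them returns the original scale factors.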

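Consider the mapping between the corners of the image and the corresponding 3D points which project
to the corners. The field of view determines the minimum and maximum X and Y values visible to the
sensor at a given depth Z.

$$ \begin{aligned} X_{min} &= -Z \tan\left(\tfrac{f_u}{2}\right), & Y_{min} &= -Z \tan\left(\tfrac{f_v}{2}\right) \\ X_{max} &= Z \tan\left(\tfrac{f_u}{2}\right), & Y_{max} &= Z \tan\left(\tfrac{f_v}{2}\right) \end{aligned} \tag{17} $$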

The  choice  of  Z  in  equations  17  will  not  matter  when  projecting  points  to  the  image  plane.  For  the  sake  of 
simplicity,  points  on  the  Z  =  1  plane  may  be  selected  as  the  3D  corners. 

The corners in the UV image plane may also be expressed in terms of upper and lower bounds.

$$ \begin{aligned} u_{min} &= 0, & u_{max} &= d_u - 1 \\ v_{min} &= 0, & v_{max} &= d_v - 1 \end{aligned} \tag{18} $$

To determine the scale factors it is sufficient to map one corner on the $Z = 1$ plane to its corresponding
corner on the UV image plane. We will select the bottom left corner $(u_{min}, v_{min})$ yielding the constraint:

$$ \begin{bmatrix} u_{min} \\ v_{min} \\ 1 \end{bmatrix} = \begin{bmatrix} s_u & 0 & t_u & 0 \\ 0 & s_v & t_v & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X_{max} \\ Y_{min} \\ 1 \\ 1 \end{bmatrix} \tag{19} $$


Note  that  the  lower  bound  on  U  is  being  mapped  to  the  upper  bound  on  X.  This  flips  the  sign  of  the  U 
axis  relative  to  the  X  axis.  The  reversal  is  made  so  that  objects  may  be  viewed  forward  of  the  camera,  i.e. 
with  positive  Z  values  while  at  the  same  time  allowing  the  image  plane  to  have  a  natural  origin  at  the  lower 
left  corner.  This  reversal  of  direction  is  illustrated  in  Figure  2. 

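Solving equation 19 for the scale factors using values from equations 17 and 18 yields

$$ s_u = \frac{u_{min} - t_u}{\tan\left(\frac{f_u}{2}\right)}, \qquad s_v = \frac{v_{min} - t_v}{\tan\left(-\frac{f_v}{2}\right)} \tag{20} $$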


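These expressions simplify to those already presented above when it is noted that when the optical centers
and image centers are coincident

$$ u_{min} - t_u = -\frac{d_u - 1}{2}, \qquad v_{min} - t_v = -\frac{d_v - 1}{2} \tag{21} $$

Thus leading finally to

$$ s_u = -\frac{d_u - 1}{2 \tan\left(\frac{f_u}{2}\right)}, \qquad s_v = \frac{d_v - 1}{2 \tan\left(\frac{f_v}{2}\right)} \tag{22} $$

Equations 16 and 22 are identical up to the change in sign on $s_u$, which reverses the U axis relative to the
X axis.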
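
The effect of the sign change can be verified by projecting the extreme visible points on the Z = 1 plane. The Python sketch below assumes a negative horizontal scale factor and a positive vertical one, with the image center at ((d_u - 1)/2, (d_v - 1)/2); the function names are hypothetical and the fields of view are the color sensor values derived from specifications in Table 2.

```python
import math


def intrinsics_from_fov(f_u, f_v, d_u, d_v):
    """Intrinsic terms from fields of view (radians) and image dimensions (pixels)."""
    t_u = (d_u - 1) / 2.0
    t_v = (d_v - 1) / 2.0
    s_u = -(d_u - 1) / (2.0 * math.tan(f_u / 2.0))  # negative: U runs opposite to X
    s_v = (d_v - 1) / (2.0 * math.tan(f_v / 2.0))
    return s_u, s_v, t_u, t_v


def project(intrinsics, X, Y, Z):
    """Normalized perspective projection of a 3D point."""
    s_u, s_v, t_u, t_v = intrinsics
    return (s_u * X / Z + t_u, s_v * Y / Z + t_v)


f_u, f_v = math.radians(39.6), math.radians(27.0)
intrinsics = intrinsics_from_fov(f_u, f_v, 720, 480)
x_max, y_max = math.tan(f_u / 2.0), math.tan(f_v / 2.0)

bottom_left = project(intrinsics, x_max, -y_max, 1.0)   # (X_max, Y_min)
top_right = project(intrinsics, -x_max, y_max, 1.0)     # (X_min, Y_max)
```

The point with the largest X and smallest Y lands on the pixel (0, 0), and the opposite extreme lands on (d_u - 1, d_v - 1), so the image origin sits at the lower left corner while the scene is viewed at positive Z.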


4  LADAR  Range  Sensor  Geometry 

The  imaging  geometry  for  the  LADAR  range  sensor  is  similar,  but  not  identical  to  that  of  a  perspective 
camera.  The  key  distinction  is  that  the  LADAR  has  a  constant  sampling  angle  per  pixel.  This  is  not  true 
of  a  projective  sensor,  in  which  pixels  subtend  increasingly  large  angles  as  they  move  away  from  the  image 
center. 

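The radial mapping between a pixel $(u, v)$ in a LADAR range image and the point in the world sampled
by that pixel depends upon the angles $(\theta, \phi)$ encoded by the pixel coordinate. There is a 2D scale and
translation transformation between the angles $(\theta, \phi)$ and pixel coordinates $(u, v)$ which moves the image
center and changes the units from radians to pixels.

$$ \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} r_u & 0 & t_u \\ 0 & r_v & t_v \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \theta \\ \phi \\ 1 \end{bmatrix} = \begin{bmatrix} r_u \theta + t_u \\ r_v \phi + t_v \\ 1 \end{bmatrix} \tag{23} $$
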

The mapping from 3D Cartesian coordinates to the image plane involves both this 2D affine transformation
as well as a spherical to Cartesian coordinate transformation. The angles $(\theta, \phi)$ are the azimuth and elevation
of a ray projecting out from the LADAR focal point to a point in the scene a distance $D$ from the focal
point.

To derive the spherical to Cartesian transformations it helps to view $(\theta, \phi)$ as representing rotation about
the vertical and horizontal axes respectively. Consider a new measurement specific coordinate system $V$ in
which the Cartesian coordinate of the point being viewed is defined as the point $D$ units out the Z axis,
that is, $(0, 0, D)^T$.

The coordinate transformation we seek maps $V$ into the canonical LADAR system $L$. To accomplish
this transformation, a point must first be rotated about the X axis by an amount $\phi$ representing the
elevation of the point above the horizontal. Next the point must be rotated about the Y axis by an amount
$\theta$ representing the panning of the sensor in the XZ plane. Composing these two rotation matrices in the
proper order defines the relationship between the two coordinate systems.


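$$ \begin{bmatrix} X \\ Y \\ Z \end{bmatrix} = \begin{bmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 \\ 0 & \cos\phi & \sin\phi \\ 0 & -\sin\phi & \cos\phi \end{bmatrix} \begin{bmatrix} 0 \\ 0 \\ D \end{bmatrix} = \begin{bmatrix} \cos\phi \sin\theta \, D \\ \sin\phi \, D \\ \cos\phi \cos\theta \, D \end{bmatrix} \tag{25} $$

The inverse mapping from Cartesian coordinates $X, Y, Z$ to the angles $(\theta, \phi)$ and depth $D$ is expressed
by the following equations:

$$ D = \sqrt{X^2 + Y^2 + Z^2} \tag{26} $$

$$ \theta = \arctan\left(\frac{X}{Z}\right) \tag{27} $$

$$ \phi = \arcsin\left(\frac{Y}{\sqrt{X^2 + Y^2 + Z^2}}\right) \tag{28} $$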


Combining these equations with the angular scale factors in equation 23 yields the expression for the UV
LADAR coordinates of a 3D point.

$$ u = r_u \arctan\left(\frac{X}{Z}\right) + t_u \tag{29} $$

$$ v = r_v \arcsin\left(\frac{Y}{\sqrt{X^2 + Y^2 + Z^2}}\right) + t_v \tag{30} $$
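
The forward and inverse spherical mappings can be exercised with a short Python sketch. It assumes points in front of the sensor (Z > 0) so that the arctangent of X/Z is well defined; the function names are hypothetical, and the numeric parameters in the example are the Fort Carson LADAR values quoted elsewhere in this report.

```python
import math


def spherical_to_cartesian(theta, phi, D):
    """Rotate the point D units along Z by elevation phi about X, then by azimuth theta about Y."""
    return (D * math.cos(phi) * math.sin(theta),
            D * math.sin(phi),
            D * math.cos(phi) * math.cos(theta))


def cartesian_to_spherical(X, Y, Z):
    """Recover azimuth, elevation and depth; valid for points with Z > 0."""
    D = math.sqrt(X * X + Y * Y + Z * Z)
    theta = math.atan(X / Z)
    phi = math.asin(Y / D)
    return theta, phi, D


def ladar_pixel(X, Y, Z, r_u, r_v, t_u, t_v):
    """UV LADAR coordinates of a 3D point under the spherical model."""
    theta, phi, _ = cartesian_to_spherical(X, Y, Z)
    return (r_u * theta + t_u, r_v * phi + t_v)


point = spherical_to_cartesian(0.1, -0.02, 100.0)
recovered = cartesian_to_spherical(*point)
center_pixel = ladar_pixel(0.0, 0.0, 50.0, -438.6, 383.1, 59.5, 11.5)
```

The round trip returns the original angles and depth, and a point on the optical axis maps to the image center regardless of its depth.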


4.1 Parameters for the Fort Carson LADAR

The LADAR used to collect the Fort Carson data was calibrated before the data collection by collecting
imagery of surveyed features and measuring the angular resolution. This yields measured $r$ values:

$$ r_u = -438.6 \left(\frac{\text{pixels}}{\text{radian}}\right), \qquad r_v = 383.1 \left(\frac{\text{pixels}}{\text{radian}}\right) \tag{31} $$

As already reported, the LADAR images contain 120x24 pixels. The field of view values reported in
Table 2 are derived from the image size and scale factors as follows:

$$ f_u = \frac{120 - 1}{438.6} = 0.271 \text{ Radians (15.5 Degrees)} \tag{32} $$

$$ f_v = \frac{24 - 1}{383.1} = 0.060 \text{ Radians (3.4 Degrees)} \tag{33} $$


In the absence of precise calibration it is assumed that the image center for the LADAR is just the center
of the bounded image plane and hence the translation terms $t_u$ and $t_v$, using the pixel center method of
Figure 2, are as follows:

$$ t_u = 59.5 \text{ Pixels} \tag{34} $$

$$ t_v = 11.5 \text{ Pixels} \tag{35} $$
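
The LADAR field of view and image center follow directly from the image size and the measured angular scale factors. The Python sketch below assumes uniform angular spacing between pixel centers, so that d pixels span d - 1 angular steps; the function names are hypothetical.

```python
import math


def angular_fov(d, r):
    """Field of view in radians spanned by d pixel centers at |r| pixels per radian."""
    return (d - 1) / abs(r)


def image_center(d):
    """Center coordinate of an image dimension whose pixel centers run from 0 to d - 1."""
    return (d - 1) / 2.0


fov_u = angular_fov(120, -438.6)
fov_v = angular_fov(24, 383.1)
center = (image_center(120), image_center(24))
```

With the measured values this gives roughly 0.271 and 0.060 radians, and an image center of (59.5, 11.5) pixels.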

4.2 Spherical Versus Perspective Mappings and Image Displacement

While  the  Spherical  Mapping  defined  in  equation  25  for  the  range  sensor  is  a  better  model  of  what  actually 
takes  place  in  LADAR,  it  is  fair  to  ask  whether  over  typical  fields  of  view  and  sampling  resolutions  the 
difference  is  significant  relative  to  a  perspective  mapping. 

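One way to answer this question is to plot the Euclidean offset between the UV mapping from spherical
projection versus the UV mapping from perspective projection for the same 3D points over the sensor field of
view. Using the perspective matrix from equation 1, the mapping for perspective projection is

$$ \begin{aligned} u_p &= s_u \left(\frac{X}{Z}\right) + t_u = -435.9 \left(\frac{X}{Z}\right) + 59.5 \\ v_p &= s_v \left(\frac{Y}{Z}\right) + t_v = 383.0 \left(\frac{Y}{Z}\right) + 11.5 \end{aligned} \tag{36} $$

Substituting the intrinsic spherical parameters in equations 31, 34 and 35 into equations 29 and 30 yields
the following spherical mapping from 3D points to the UV LADAR image plane.

$$ \begin{aligned} u_s &= r_u \arctan\left(\frac{X}{Z}\right) + t_u = -438.6 \arctan\left(\frac{X}{Z}\right) + 59.5 \\ v_s &= r_v \arcsin\left(\frac{Y}{\sqrt{X^2 + Y^2 + Z^2}}\right) + t_v = 383.1 \arcsin\left(\frac{Y}{\sqrt{X^2 + Y^2 + Z^2}}\right) + 11.5 \end{aligned} \tag{37} $$
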

It is now possible to ask how these two mappings differ over the sensor field of view. For points at
$Z = 100$, the visible points have X values ranging from -13.65 to 13.65 and Y values ranging from -3.0
to 3.0. By plotting the Euclidean distance between the two image mappings for corresponding 3D points
one can see the extent to which the two mappings differ over different portions of the image. The Euclidean
difference $\Delta$ may be written as:

$$ \Delta(X, Y) = \sqrt{(u_p - u_s)^2 + (v_p - v_s)^2} \tag{38} $$

This difference measured over all visible points is shown as a surface plot in Figure 3. Observe that the difference never exceeds 0.15 pixels over the entire image. Hence, while the spherical mapping is more correct given the sensor design, the two pixel mappings never differ by more than 15% of the width of a single pixel.
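
A minimal Python sketch of this comparison is given below, using the numerical values quoted in equations 36 and 37. The grid resolution and the function names are illustrative assumptions, not part of the report; the sketch simply evaluates the Euclidean difference of equation 38 over the plane Z = 100 and reports the largest value found.

```python
import math

# Values quoted in equations 36 and 37 for the Fort Carson LADAR.
S_U, S_V = -435.9, 383.0   # perspective scale factors
R_U, R_V = -438.6, 383.1   # spherical scale factors
T_U, T_V = 59.5, 11.5      # image center in pixels

def perspective_uv(x, y, z):
    """Perspective mapping of equation 36."""
    return S_U * (x / z) + T_U, S_V * (y / z) + T_V

def spherical_uv(x, y, z):
    """Spherical mapping of equation 37."""
    rng = math.sqrt(x * x + y * y + z * z)
    return R_U * math.atan(x / z) + T_U, R_V * math.asin(y / rng) + T_V

def pixel_offset(x, y, z):
    """Euclidean distance between the two mappings (equation 38)."""
    up, vp = perspective_uv(x, y, z)
    us, vs = spherical_uv(x, y, z)
    return math.hypot(up - us, vp - vs)

def max_offset(z=100.0, x_half=13.65, y_half=3.0, steps=121):
    """Largest offset over a regular grid covering the visible area (grid size is arbitrary)."""
    worst = 0.0
    for i in range(steps):
        x = -x_half + 2.0 * x_half * i / (steps - 1)
        for j in range(steps):
            y = -y_half + 2.0 * y_half * j / (steps - 1)
            worst = max(worst, pixel_offset(x, y, z))
    return worst

print(f"largest offset over the grid: {max_offset():.3f} pixels")
```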

4.3  Spherical  Versus  Perspective  Mappings  and  Range  Displacement 

The  previous  section  showed  that  relative  to  the  displacement  between  pixel  coordinates,  the  difference 
between  spherical  and  perspective  projection  is  not  significant  for  the  specific  case  of  the  Fort  Carson  LADAR. 

This  section  takes  up  a  different  question:  what  if  range  data  is  back-projected  into  the  scene  assuming 
perspective  when  it  has  in  fact  been  collected  using  spherical  projection.  To  test  this  case,  the  same  field  of 
planar  points  at  Z  =  100  will  be  considered.  This  time,  these  will  be  imaged  using  the  spherical  projection, 
and  then  back-projected  into  the  scene  using  perspective. 

The mapping for points in the world to pixels using spherical projection was already expressed in equation 37. Let us introduce the convention of writing down a range pixel I as a coordinate-value pair:

$$I = \left( \begin{pmatrix} u \\ v \end{pmatrix}, D \right) = \left( \begin{pmatrix} r_u \arctan\left(\frac{X}{Z}\right) + t_u \\ r_v \arcsin\left(\frac{Y}{\sqrt{X^2 + Y^2 + Z^2}}\right) + t_v \end{pmatrix}, \sqrt{X^2 + Y^2 + Z^2} \right)$$

where u and v are the image coordinates of a point in 3D and D is the recorded depth to that point.

The  projected  point  can  then  be  back-projected  into  the  scene  using  the  perspective  rather  than  spherical 
mapping.  Figure  4  shows  the  error  introduced  by  mixing  the  projection  schemes.  The  graph  has  been 
generated  for  pixels  at  100m.  Note  the  error  is  quite  small  relative  to  the  large  change  in  Z. 
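
The experiment can be sketched in Python as follows. The forward model is the spherical mapping of equation 37; the back-projection shown here is an assumed inverse of the perspective mapping of equation 36 in which the recorded depth D is placed along the perspective ray through the pixel. The report does not spell out this inverse in the present section, so the function below should be read as one plausible interpretation rather than the exact procedure used for Figure 4.

```python
import math

R_U, R_V = -438.6, 383.1   # spherical scale factors (equation 37)
S_U, S_V = -435.9, 383.0   # perspective scale factors (equation 36)
T_U, T_V = 59.5, 11.5      # image center in pixels

def spherical_range_pixel(x, y, z):
    """Image a 3D point with the spherical model; return (u, v, D)."""
    d = math.sqrt(x * x + y * y + z * z)
    u = R_U * math.atan(x / z) + T_U
    v = R_V * math.asin(y / d) + T_V
    return u, v, d

def perspective_backproject(u, v, d):
    """Assumed inverse: put the point at range d along the perspective ray through (u, v)."""
    rx = (u - T_U) / S_U   # X/Z implied by the perspective model
    ry = (v - T_V) / S_V   # Y/Z implied by the perspective model
    z = d / math.sqrt(rx * rx + ry * ry + 1.0)
    return rx * z, ry * z, z

def worst_depth_error(z_true=100.0, x_half=13.65, y_half=3.0, steps=61):
    """Largest |Z' - Z| over a regular grid on the plane Z = z_true (grid size is arbitrary)."""
    worst = 0.0
    for i in range(steps):
        x = -x_half + 2.0 * x_half * i / (steps - 1)
        for j in range(steps):
            y = -y_half + 2.0 * y_half * j / (steps - 1)
            _, _, z_rec = perspective_backproject(*spherical_range_pixel(x, y, z_true))
            worst = max(worst, abs(z_rec - z_true))
    return worst

print(f"largest depth error: {1000.0 * worst_depth_error():.2f} mm")
```

Under these assumptions the reconstructed depth stays within a few millimeters of the true 100 meters.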

Figure 3: Pixel mapping difference $\Delta$ over viewable points (X, Y). $\Delta$ is the Euclidean distance between spherical and perspective projections. The difference is least along the line Y = 0 and never exceeds 0.15.

5  Deviations  from  Perfect  Boresight  Alignment 

Consider transformations between three distinct 3D coordinate reference frames: a world reference W and two sensor reference frames A and B. Define the 3D transformation from frame W to frame A as $M_{WA}$, composed of a scale transformation $S_{WA}$, a rotation $R_{WA}$ and a translation $T_{WA}$. Hence, the standard mapping of points from one frame of reference to another is accomplished by pre-multiplying the point by the transformation.

$$A = M_{WA} W \qquad (41)$$

where W is a point in reference frame W and A is the same point expressed in sensor reference frame A. An analogous transformation $M_{WB}$ maps points from the world to reference frame B.

These transformations are often described as the extrinsic sensor parameters. In the case where $M_{WA} = M_{WB}$, the sensors have identical reference frames and pixel mappings will differ only as a function of the sensor parameters. Equality of extrinsic parameters between two sensors may be thought of as the condition which arises when two sensors are perfectly bore-sight aligned. In other words, they have coincident focal points and coincident optical axes.

Figure 4: Reconstructed Z' for viewable points (X, Y, 100) when perspective and spherical projection are mixed. Here, the X-axis (on the right) and Y-axis (on the left) are in meters at the target. The Z-axis is Z' in meters, with the plane at 100 meters representing the true depth, Z. Note the characteristic shape similar to pixel displacement, but tiny absolute deviation from the true 100 meters. Greatest deviation is roughly 2 millimeters in Z over 25 meters in X.

How minor deviations from perfect bore-sight alignment alter the relative mapping between sensor pixels is of key interest when doing sensor fusion. The remainder of this section will consider how 3D points map to the respective pixel coordinates of sensors A and B.

5.1  The  Base-Case:  Perfect  Bore-Sight  Alignment 

In the case of perfect bore-sight alignment, the transformations $M_{WA}$ and $M_{WB}$ are equal and hence may be neglected. Therefore, the projection matrices defined above, operating on the same 3D point, allow us to derive points A and B in the respective image coordinates of the two sensors A and B.

$$A' = \begin{bmatrix} s_{\alpha u} & 0 & t_{\alpha u} & 0 \\ 0 & s_{\alpha v} & t_{\alpha v} & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \qquad B' = \begin{bmatrix} s_{\beta u} & 0 & t_{\beta u} & 0 \\ 0 & s_{\beta v} & t_{\beta v} & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

The prime notation A' emphasizes that this linear algebraic expression produces a point equivalent up to a scale factor and that the true $u_\alpha$ and $v_\alpha$ coordinates are indicated in the normalized form A where the third element is 1.

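Carrying out the indicated linear multiplication yields:

$$A' = \begin{bmatrix} s_{\alpha u} X + t_{\alpha u} Z \\ s_{\alpha v} Y + t_{\alpha v} Z \\ Z \end{bmatrix}, \qquad B' = \begin{bmatrix} s_{\beta u} X + t_{\beta u} Z \\ s_{\beta v} Y + t_{\beta v} Z \\ Z \end{bmatrix} \qquad (43)$$

and after normalization, the following are the coordinates of a point (X, Y, Z) in the $UV_\alpha$ and the $UV_\beta$ planes.

$$u_\alpha = s_{\alpha u} \left(\frac{X}{Z}\right) + t_{\alpha u}, \qquad u_\beta = s_{\beta u} \left(\frac{X}{Z}\right) + t_{\beta u}$$

$$v_\alpha = s_{\alpha v} \left(\frac{Y}{Z}\right) + t_{\alpha v}, \qquad v_\beta = s_{\beta v} \left(\frac{Y}{Z}\right) + t_{\beta v} \qquad (44)$$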

Our goal is a linear mapping from $UV_\alpha$ to $UV_\beta$. This can be readily computed by equating the common ratio terms $X/Z$ and $Y/Z$ in the above equation and solving for the $UV_\beta$ coordinates in terms of the $UV_\alpha$ coordinates.
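
Eliminating the ratios in equation 44 gives, for example, $u_\beta = (s_{\beta u}/s_{\alpha u})(u_\alpha - t_{\alpha u}) + t_{\beta u}$, and likewise for v. The Python sketch below illustrates this elimination. The `Intrinsics` container and function names are hypothetical, the LADAR values are those of equation 36, and the second sensor's parameters are invented purely for illustration.

```python
from typing import NamedTuple

class Intrinsics(NamedTuple):
    su: float  # horizontal scale factor
    sv: float  # vertical scale factor
    tu: float  # image center, u
    tv: float  # image center, v

def project(cam, x, y, z):
    """Perspective projection of a 3D point (equation 44)."""
    return cam.su * x / z + cam.tu, cam.sv * y / z + cam.tv

def boresight_map(a, b, ua, va):
    """Map a pixel of sensor A to sensor B by eliminating X/Z and Y/Z."""
    ub = (b.su / a.su) * (ua - a.tu) + b.tu
    vb = (b.sv / a.sv) * (va - a.tv) + b.tv
    return ub, vb

ladar = Intrinsics(-435.9, 383.0, 59.5, 11.5)    # perspective LADAR values, equation 36
camera = Intrinsics(800.0, 800.0, 320.0, 240.0)  # hypothetical optical sensor

point = (5.0, -1.0, 100.0)
ua, va = project(ladar, *point)
mapped = boresight_map(ladar, camera, ua, va)
direct = project(camera, *point)
print("mapped from LADAR pixel:", mapped)
print("projected directly:     ", direct)
```

No depth value enters the mapping, which is what makes the perfectly bore-sight aligned case attractive.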

5.2  Rotation  About  the  Optical  Axis

Consider now the above case with a minor modification: sensor B is rotated about Z, the optical axis, by an amount $\phi$. The transformation and projection equations are:

In a manner similar to that shown above, one can equate the ratios $X/Z$ and $Y/Z$ and solve for a linear mapping from $UV_\alpha$ to $UV_\beta$.

In the special case that the horizontal and vertical scale factors for sensor A are equal, as are the scale factors for sensor B, then and only then does this reduce to a 2D rotation plus a translation. To see this, let

$$s_\alpha = s_{\alpha u} = s_{\alpha v}, \qquad s_\beta = s_{\beta u} = s_{\beta v}, \qquad k = \frac{s_\beta}{s_\alpha} \qquad (48)$$

and note that equation 47 simplifies to

$$\begin{bmatrix} u_\beta \\ v_\beta \end{bmatrix} = k \begin{bmatrix} \cos\phi & \sin\phi \\ -\sin\phi & \cos\phi \end{bmatrix} \begin{bmatrix} u_\alpha \\ v_\alpha \end{bmatrix} + \begin{bmatrix} t_{\beta u} - k \cos\phi \, t_{\alpha u} - k \sin\phi \, t_{\alpha v} \\ t_{\beta v} + k \sin\phi \, t_{\alpha u} - k \cos\phi \, t_{\alpha v} \end{bmatrix} \qquad (49)$$


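When the scale factors are unequal, then the mapping between $UV_\alpha$ and $UV_\beta$ is a general 2D affine transformation.

$$B = M A \qquad (50)$$

where

$$M = \begin{bmatrix} \frac{s_{\beta u}}{s_{\alpha u}} \cos\phi & \frac{s_{\beta u}}{s_{\alpha v}} \sin\phi & t_{\beta u} - \frac{s_{\beta u}}{s_{\alpha u}} \cos\phi \, t_{\alpha u} - \frac{s_{\beta u}}{s_{\alpha v}} \sin\phi \, t_{\alpha v} \\ -\frac{s_{\beta v}}{s_{\alpha u}} \sin\phi & \frac{s_{\beta v}}{s_{\alpha v}} \cos\phi & t_{\beta v} + \frac{s_{\beta v}}{s_{\alpha u}} \sin\phi \, t_{\alpha u} - \frac{s_{\beta v}}{s_{\alpha v}} \cos\phi \, t_{\alpha v} \\ 0 & 0 & 1 \end{bmatrix} \qquad (51)$$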

The basic conclusion to be drawn from this section is that there is still an exact 2D affine mapping between $UV_\alpha$ and $UV_\beta$ and, subject to equal horizontal and vertical scaling, the rotation angle maps directly into 2D. However, in the more general case, the full 6 degrees of freedom associated with the 2D affine transform are needed to represent warping induced by rotation about unequally sampled axes.
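
The affine mapping for a roll about the optical axis can be sketched in Python as below. The matrix is derived here by substituting $X/Z = (u_\alpha - t_{\alpha u})/s_{\alpha u}$ and $Y/Z = (v_\alpha - t_{\alpha v})/s_{\alpha v}$ into the projection of the rotated point, using the sign convention of equation 49. The `Intrinsics` container, the function names and the sample sensor values are assumptions made for illustration only.

```python
import math
from typing import NamedTuple

class Intrinsics(NamedTuple):
    su: float
    sv: float
    tu: float
    tv: float

def project(cam, x, y, z):
    """Perspective projection with scale factors and image center."""
    return cam.su * x / z + cam.tu, cam.sv * y / z + cam.tv

def project_rolled(cam, x, y, z, phi):
    """Project after expressing the point in a frame rolled by phi about Z."""
    c, s = math.cos(phi), math.sin(phi)
    return project(cam, c * x + s * y, -s * x + c * y, z)

def roll_affine(a, b, phi):
    """3x3 matrix M with [ub, vb, 1] = M [ua, va, 1] when B is rolled by phi."""
    c, s = math.cos(phi), math.sin(phi)
    m00, m01 = b.su * c / a.su, b.su * s / a.sv
    m10, m11 = -b.sv * s / a.su, b.sv * c / a.sv
    return [
        [m00, m01, b.tu - m00 * a.tu - m01 * a.tv],
        [m10, m11, b.tv - m10 * a.tu - m11 * a.tv],
        [0.0, 0.0, 1.0],
    ]

def apply_affine(m, u, v):
    """Apply the top two rows of a 3x3 affine matrix to a pixel."""
    return m[0][0] * u + m[0][1] * v + m[0][2], m[1][0] * u + m[1][1] * v + m[1][2]

sensor_a = Intrinsics(-435.9, 383.0, 59.5, 11.5)   # perspective LADAR values, equation 36
sensor_b = Intrinsics(800.0, 760.0, 320.0, 240.0)  # hypothetical optical sensor
phi = math.radians(2.0)

m = roll_affine(sensor_a, sensor_b, phi)
ua, va = project(sensor_a, 5.0, -1.0, 100.0)
print("via affine map:", apply_affine(m, ua, va))
print("direct:        ", project_rolled(sensor_b, 5.0, -1.0, 100.0, phi))
```

With unequal scale factors the upper-left 2x2 block is no longer a scaled rotation, which is why all six affine parameters are needed in general.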


5.3  Translation  Along  the  Z  axis 


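Consider again a perfectly bore-sight aligned pair of sensors and now ask what happens if sensor B is translated ahead of or behind sensor A along the common optical axis. Under these conditions, the projection equations for a common point are:

$$A' = \begin{bmatrix} s_{\alpha u} & 0 & t_{\alpha u} & 0 \\ 0 & s_{\alpha v} & t_{\alpha v} & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \qquad B' = \begin{bmatrix} s_{\beta u} & 0 & t_{\beta u} & 0 \\ 0 & s_{\beta v} & t_{\beta v} & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & T_z \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}$$

Clearly the coordinates on the $UV_\alpha$ plane are the same as derived in equation 44. The coordinates of the same point in the $UV_\beta$ plane are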


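$$u_\beta = \frac{s_{\beta u} X + t_{\beta u} Z + t_{\beta u} T_z}{Z + T_z} \qquad (53)$$

$$v_\beta = \frac{s_{\beta v} Y + t_{\beta v} Z + t_{\beta v} T_z}{Z + T_z} \qquad (54)$$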


The denominator $T_z + Z$ complicates the task of comparing coordinate systems. However, X and Y can be expressed in terms of the other variables for each transformation and then these common expressions themselves may be equated. Thus, coordinates in $UV_\beta$ may be expressed in terms of coordinates in $UV_\alpha$ so long as the depth value Z is assumed to be known.


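$$B = M A \qquad (55)$$

where

$$M = \begin{bmatrix} \frac{Z \, s_{\beta u}}{(T_z + Z) \, s_{\alpha u}} & 0 & t_{\beta u} - \frac{Z \, s_{\beta u} \, t_{\alpha u}}{(T_z + Z) \, s_{\alpha u}} \\ 0 & \frac{Z \, s_{\beta v}}{(T_z + Z) \, s_{\alpha v}} & t_{\beta v} - \frac{Z \, s_{\beta v} \, t_{\alpha v}}{(T_z + Z) \, s_{\alpha v}} \\ 0 & 0 & 1 \end{bmatrix} \qquad (56)$$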


The depth of a point Z changes the scaling applied to the point in mapping between $UV_\alpha$ and $UV_\beta$. Under the highly restricted case of viewing points all lying in a plane of constant Z, the scaling is constant for all points and the matrix M reduces to a simple 2D affine transformation. However, in general no single 2D affine transformation can capture the $UV_\alpha$ to $UV_\beta$ mapping if sensor B is translated ahead of or behind sensor A.
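
The depth dependence can be illustrated with a small Python sketch. It implements the mapping of equation 56 written out per coordinate; the `Intrinsics` container, function names and the second sensor's parameters are hypothetical and chosen only for illustration.

```python
from typing import NamedTuple

class Intrinsics(NamedTuple):
    su: float
    sv: float
    tu: float
    tv: float

def project(cam, x, y, z):
    """Perspective projection with scale factors and image center."""
    return cam.su * x / z + cam.tu, cam.sv * y / z + cam.tv

def z_shift_map(a, b, ua, va, z, tz):
    """Map a pixel of A to B when B is displaced by tz along the optical axis.

    The depth z of the point in A's frame must be known.
    """
    gain = z / (z + tz)  # depth-dependent scale change
    ub = gain * (b.su / a.su) * (ua - a.tu) + b.tu
    vb = gain * (b.sv / a.sv) * (va - a.tv) + b.tv
    return ub, vb

sensor_a = Intrinsics(-435.9, 383.0, 59.5, 11.5)   # perspective LADAR values, equation 36
sensor_b = Intrinsics(800.0, 800.0, 320.0, 240.0)  # hypothetical optical sensor
tz = 0.5                                           # hypothetical offset in meters

for depth in (20.0, 100.0):
    ua, va = project(sensor_a, 2.0, 0.5, depth)
    mapped = z_shift_map(sensor_a, sensor_b, ua, va, depth, tz)
    direct = project(sensor_b, 2.0, 0.5, depth + tz)
    print(f"Z = {depth:5.1f}: mapped {mapped}, direct {direct}")
```

Using the wrong depth in `z_shift_map` changes the gain, which is the reason a single affine transformation cannot serve points at different depths.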




5.4  Translation  in  a  Common  Image  Plane 


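Again, starting with the perfectly bore-sight aligned configuration, consider what happens when sensor B translates in a common XY image plane. The projection equations for this case are:

$$A' = \begin{bmatrix} s_{\alpha u} & 0 & t_{\alpha u} & 0 \\ 0 & s_{\alpha v} & t_{\alpha v} & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix}, \qquad B' = \begin{bmatrix} s_{\beta u} & 0 & t_{\beta u} & 0 \\ 0 & s_{\beta v} & t_{\beta v} & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{bmatrix} 1 & 0 & 0 & T_x \\ 0 & 1 & 0 & T_y \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} X \\ Y \\ Z \\ 1 \end{bmatrix} \qquad (57)$$

The coordinates of the projection of the point (X, Y, Z) in the $UV_\beta$ plane are

$$u_\beta = \frac{s_{\beta u} (X + T_x)}{Z} + t_{\beta u} \qquad (58)$$

$$v_\beta = \frac{s_{\beta v} (Y + T_y)}{Z} + t_{\beta v} \qquad (59)$$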


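Taking the same approach as in the previous section, the coordinates of a common point projected to the $UV_\beta$ plane expressed in terms of its coordinates in the $UV_\alpha$ plane can be shown to depend upon the depth of the point Z.

$$B = M A \qquad (60)$$

where

$$M = \begin{bmatrix} \frac{s_{\beta u}}{s_{\alpha u}} & 0 & \frac{s_{\beta u} T_x}{Z} - \frac{s_{\beta u} t_{\alpha u}}{s_{\alpha u}} + t_{\beta u} \\ 0 & \frac{s_{\beta v}}{s_{\alpha v}} & \frac{s_{\beta v} T_y}{Z} - \frac{s_{\beta v} t_{\alpha v}}{s_{\alpha v}} + t_{\beta v} \\ 0 & 0 & 1 \end{bmatrix} \qquad (61)$$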


Unlike the translation in Z case, the dependency on depth in this case modifies the translation of one plane relative to the other. In simple and intuitive terms, this means that for points all at a common depth Z there is a 2D affine mapping between $UV_\alpha$ and $UV_\beta$.

In general, the 2D scale change between systems is identical to that for perfect bore-sight aligned sensors. The relative 2D translation is similar to the bore-sighted case but with one additional term which depends upon the depth of the points Z and the planar translation $T_x$, $T_y$ between the two sensors. As should come as no surprise, the apparent 2D translation between $UV_\alpha$ and $UV_\beta$ gets larger for points near the sensors and smaller for points far from the sensors.
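
A short Python sketch of the planar translation case follows. It writes out the mapping of equation 61 per coordinate and prints the depth-dependent translation term for a near and a far point. The `Intrinsics` container, the baseline values and the second sensor's parameters are hypothetical.

```python
from typing import NamedTuple

class Intrinsics(NamedTuple):
    su: float
    sv: float
    tu: float
    tv: float

def project(cam, x, y, z):
    """Perspective projection with scale factors and image center."""
    return cam.su * x / z + cam.tu, cam.sv * y / z + cam.tv

def planar_shift_map(a, b, ua, va, z, tx, ty):
    """Map a pixel of A to B when B is translated by (tx, ty) in the XY plane."""
    ub = (b.su / a.su) * (ua - a.tu) + b.su * tx / z + b.tu
    vb = (b.sv / a.sv) * (va - a.tv) + b.sv * ty / z + b.tv
    return ub, vb

def parallax(b, z, tx, ty):
    """Extra image translation, in pixels, caused by the planar baseline at depth z."""
    return b.su * tx / z, b.sv * ty / z

sensor_a = Intrinsics(-435.9, 383.0, 59.5, 11.5)   # perspective LADAR values, equation 36
sensor_b = Intrinsics(800.0, 800.0, 320.0, 240.0)  # hypothetical optical sensor
tx, ty = 0.3, 0.1                                  # hypothetical baseline in meters

for depth in (10.0, 100.0):
    print(f"Z = {depth:5.1f}: parallax {parallax(sensor_b, depth, tx, ty)} pixels")
```

The scale terms do not involve Z at all, so only the translation column of the matrix needs to be updated as depth changes.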


5.5  Rotation  About  the  Horizontal  and  Vertical  Axes 


The case of rotation about the X and Y axes is considerably more complicated than those previously considered. It is no longer practical to solve for a direct mapping between the $UV_\alpha$ and $UV_\beta$ image planes. The mapping is no longer well expressed as a 2D affine transformation, even allowing for simple parameterization in, say, the depth value Z. The relationship between image spaces is coupled through the angles of rotation and dependent upon all three point coordinates: (X, Y, Z).

It is relatively simple, though, to express the mapping from points in the world to the $UV_\beta$ image plane.

The coordinates of the projection of the point (X, Y, Z) may now be written as

$$u_\beta = \frac{s_{\beta u} \left( \cos\phi_y X + (\cos\phi_x Z - \sin\phi_x Y) \sin\phi_y \right)}{(\cos\phi_x Z - \sin\phi_x Y) \cos\phi_y - \sin\phi_y X} + t_{\beta u} \qquad (63)$$

$$v_\beta = \frac{s_{\beta v} \left( \cos\phi_x Y + \sin\phi_x Z \right)}{(\cos\phi_x Z - \sin\phi_x Y) \cos\phi_y - \sin\phi_y X} + t_{\beta v} \qquad (64)$$

6  Approximating  Small  Pan/Tilt  Errors  with  Translation


Much  of  the  previous  development  has  been  working  up  to  answer  the  following  question.  Given  two  sensors, 
one  range  and  one  optical,  which  are  nearly  bore-sight  aligned,  how  much  error  is  introduced  if  small  rotations 
about  the  X  and  Y  axes  are  modeled  as  translation  in  a  common  image  plane? 

To  address  this  question,  first  there  is  the  matter  of  whether  to  model  the  range  sensor  as  a  spherical  or 
a  perspective  projection  sensor.  While  we  know  spherical  is  more  consistent  with  the  actual  construction  of 
the  sensor,  Section  4  demonstrated  that  little  error  is  introduced  for  a  sensor  with  parameters  such  as  were 
used  in  the  Fort  Carson  data  collection.  The  greatest  error  in  relative  pixel  mappings  over  the  sensor  field  of 
view  was  0.15  pixels.  Therefore,  because  it  will  make  our  task  slightly  simpler,  both  the  range  and  optical 
sensors  will  be  assumed  to  be  projective. 

The equations relating a general point (X, Y, Z) to the image plane of a sensor assuming translation versus rotation have already been presented in equations 58, 59, 63 and 64. Here let us restate these equations with the following change. Let sensor A rotate about X and Y and let sensor B translate in the common XY plane.


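$$u_\alpha = \frac{s_{\alpha u} \left( \cos\phi_y X + (\cos\phi_x Z - \sin\phi_x Y) \sin\phi_y \right)}{(\cos\phi_x Z - \sin\phi_x Y) \cos\phi_y - \sin\phi_y X} + t_{\alpha u} \qquad (71)$$

$$v_\alpha = \frac{s_{\alpha v} \left( \cos\phi_x Y + \sin\phi_x Z \right)}{(\cos\phi_x Z - \sin\phi_x Y) \cos\phi_y - \sin\phi_y X} + t_{\alpha v} \qquad (72)$$

$$u_\beta = \frac{s_{\beta u} (X + T_x)}{Z} + t_{\beta u} \qquad (73)$$

$$v_\beta = \frac{s_{\beta v} (Y + T_y)}{Z} + t_{\beta v} \qquad (74)$$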

There are two steps involved in analyzing the degree to which the translation mapping can stand in for the rotation mapping. The first addresses the question of sensor parameters. The comparison is most direct if identical sensor parameters are assumed. For the comparison, we will use the following intrinsic parameters:

$$s_{\alpha u} = s_{\beta u} = s_u, \qquad s_{\alpha v} = s_{\beta v} = s_v, \qquad t_{\alpha u} = t_{\beta u} = 0, \qquad t_{\alpha v} = t_{\beta v} = 0 \qquad (75)$$

The image center translation terms, once equal, will drop out of any comparison, and therefore might as well be set to zero.

The next matter is how to determine what degree of translation $T_x$, $T_y$ to introduce in order to approximate a rotation $\phi_x$, $\phi_y$. The goal shall be to track a common point at depth D. More precisely, consider the point (0, 0, D) in the coordinate reference frame for sensor A. Determine a translation for sensor B such that this point projects to the center of the $UV_\beta$ image plane.

To accomplish this transformation, observe that the coordinates in the world reference frame W of the point to be tracked, P', may be expressed in terms of the rotation of sensor A. The world coordinates of the point P' are therefore

$$\begin{bmatrix} X' \\ Y' \\ Z' \\ 1 \end{bmatrix} = \begin{bmatrix} -\sin\phi_y D \\ -\sin\phi_x \cos\phi_y D \\ \cos\phi_x \cos\phi_y D \\ 1 \end{bmatrix} \qquad (77)$$


Setting equations 73 and 74 equal to zero for the point to be tracked provides an expression for the translations necessary to keep the point at depth D in the center of view of the translating as well as the rotating sensor.

$$T_x = \sin\phi_y D, \qquad T_y = \cos\phi_y \sin\phi_x D \qquad (78)$$

Substituting these back into equations 73 and 74 yields a general mapping from world reference frame W to the $UV_\beta$ image plane, subject to the constraint that the translating sensor B track the point at depth D viewed by the rotating sensor A.

$$u_\beta = \frac{s_u (X + \sin\phi_y D)}{Z} \qquad (79)$$

$$v_\beta = \frac{s_v (Y + \cos\phi_y \sin\phi_x D)}{Z} \qquad (80)$$


Recall also that the simplifications from equation 75 have been employed.

It is now possible to define a simple expression which is a function of a point's placement in the world, (X, Y, Z), and sensor A's rotation $(\phi_x, \phi_y)$, and which represents the Euclidean distance between projections of equivalent 3D points on the two image planes $UV_\alpha$ and $UV_\beta$.

$$\Delta = \sqrt{(u_\alpha - u_\beta)^2 + (v_\alpha - v_\beta)^2} \qquad (82)$$


Using  equation  82  it  is  possible  to  examine  a  variety  of  scenarios  and  determine  how  much  pixel  error 
between  corresponding  3D  points  is  introduced  when  XY  sensor  translation  is  used  to  approximate  XY 
sensor  rotation. 
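
The comparison can be sketched in Python as follows. The functions implement equations 71 and 72 (rotating sensor A), equations 73 and 74 (translating sensor B), the tracking translation of equation 78 and the distance of equation 82, all under the simplifications of equation 75. The scale factors used in the example are hypothetical placeholders, not the Table 2 values, and the function names are illustrative.

```python
import math

def rotated_uv(x, y, z, phi_x, phi_y, su, sv):
    """Equations 71 and 72 with zero image-center terms (rotating sensor A)."""
    z1 = math.cos(phi_x) * z - math.sin(phi_x) * y
    denom = z1 * math.cos(phi_y) - math.sin(phi_y) * x
    u = su * (math.cos(phi_y) * x + z1 * math.sin(phi_y)) / denom
    v = sv * (math.cos(phi_x) * y + math.sin(phi_x) * z) / denom
    return u, v

def tracking_translation(phi_x, phi_y, d):
    """Equation 78: translation that keeps the tracked point centered in B."""
    return math.sin(phi_y) * d, math.cos(phi_y) * math.sin(phi_x) * d

def translated_uv(x, y, z, tx, ty, su, sv):
    """Equations 73 and 74 with zero image-center terms (translating sensor B)."""
    return su * (x + tx) / z, sv * (y + ty) / z

def delta(x, y, z, phi_x, phi_y, d, su, sv):
    """Equation 82: pixel distance between the two projections of (x, y, z)."""
    ua, va = rotated_uv(x, y, z, phi_x, phi_y, su, sv)
    tx, ty = tracking_translation(phi_x, phi_y, d)
    ub, vb = translated_uv(x, y, z, tx, ty, su, sv)
    return math.hypot(ua - ub, va - vb)

def tracked_point(phi_x, phi_y, d):
    """Equation 77: world coordinates of the point on A's optical axis at depth d."""
    return (-math.sin(phi_y) * d,
            -math.sin(phi_x) * math.cos(phi_y) * d,
            math.cos(phi_x) * math.cos(phi_y) * d)

SU = SV = 1000.0                      # hypothetical scale factors
phi_x, phi_y = math.radians(0.1), math.radians(0.2)
D = 100.0

for x in (-5.0, 0.0, 5.0):
    print(f"X = {x:5.1f}, Z = 100: delta = {delta(x, 0.0, 100.0, phi_x, phi_y, D, SU, SV):.4f} pixels")
```

By construction the two projections coincide at the tracked point, so any error comes from points away from that point in X, Y or Z.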


6.1  Scenario  1:  Points  at  Constant  Depth

Consider a plane of points at depth Z = 100 meters. The tracking point will therefore be assumed to lie at depth D = 100 as well. Set the azimuth $\phi_y$ and the elevation $\phi_x$ to specific values and plot $\Delta$ as a function of the independent variables X and Y. The plots in Figures 5a and 5b show $\Delta$ for the color sensor parameters (row 2 of Table 2). The difference between the two is that Figure 5a uses smaller rotations than Figure 5b. Figures 5c and 5d show $\Delta$ for the FLIR (row 4 of Table 2) and range sensor. The reduced fields of view and pixel resolution lead to much smaller absolute $\Delta$ values for these sensors.

To understand the plots in Figure 5, it is important to understand the choice of bounds for X and Y, both in this and other examples. The idea is to select bounds which match that portion of 3D space visible to the sensor. To accomplish this, the bounds must depend upon two things. First, they depend upon the width W and height H of the area in view at depth Z. They also depend upon the translation used to track the center point being viewed by both sensors: $(T_x, T_y)$.

$$W = 2 \tan\left(\frac{\mathrm{FOV}_u}{2}\right) Z, \qquad H = 2 \tan\left(\frac{\mathrm{FOV}_v}{2}\right) Z$$

Hence the bounds on X and Y are:

$$X_{max} = T_x + W/2, \qquad X_{min} = T_x - W/2$$

$$Y_{max} = T_y + H/2, \qquad Y_{min} = T_y - H/2$$
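
As a check on these bounds, the Python sketch below evaluates them for the LADAR fields of view of equations 32 and 33 at Z = 100 with no tracking translation. The function name and argument order are illustrative assumptions. The result agrees with the visible range of roughly 13.65 in X and 3.0 in Y quoted in Section 4.2.

```python
import math

def visible_bounds(fov_u, fov_v, z, tx=0.0, ty=0.0):
    """Return ((x_min, x_max), (y_min, y_max)) for the area in view at depth z."""
    half_w = math.tan(fov_u / 2.0) * z   # W / 2
    half_h = math.tan(fov_v / 2.0) * z   # H / 2
    return (tx - half_w, tx + half_w), (ty - half_h, ty + half_h)

# LADAR fields of view from equations 32 and 33.
fov_u = (120 - 1) / 438.6
fov_v = (24 - 1) / 383.1

(x_min, x_max), (y_min, y_max) = visible_bounds(fov_u, fov_v, 100.0)
print(f"X in [{x_min:.2f}, {x_max:.2f}], Y in [{y_min:.2f}, {y_max:.2f}]")
```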

6.2  Scenario  2:  Limited  Deviation  from  Target  Tracking  Depth 

The next question to consider using $\Delta$ from equation 82 is how the rotation versus translation approximation holds up over a limited range of depth values centered about the tracking depth. To give this some practical motivation, assume an object 2 meters wide is viewed by sensors A and B, with A rotated about the vertical axis by an amount $\phi_y = 1/5$ degree.

Figure 6 shows $\Delta$ as a dependent variable of X and Z. For the intrinsic camera parameters, the values for the color sensor in Table 2, row 2, are used. The other variables in equation 82 are constrained as follows.

The term k allows movement of the point being viewed vertically, with k = 0.5 specifying the center of the image.


6.3  Scenario  3:  Wide  Variation  in  Depth 

It is clear that translation very well approximates rotation for a depth field about the tracking depth D. Another question is how significantly the approximation degrades as Z values depart from D.

Figure 7 shows $\Delta$ as a dependent function of rotation angle $\phi_y$ and point depth Z. For the intrinsic camera parameters, the values for the color sensor in Table 2, row 2, are again used. The other variables in equation 82 are constrained as follows.

Figure 6: Pixel differences $\Delta$ over small variations in depth. a) Viewing the image center (k = 0.5). b) Viewing a point 60% up the image (k = 0.6). Vertical axis $\Delta$, left bottom axis Z and right bottom axis X.

Figure 7: Pixel differences $\Delta$ over large variations in depth for a center pixel as a function of Z and $\phi_y$. a) Surface plot with $\Delta$ vertical, Z facing and $\phi_y$ back into page. b) Contour plot of the same data with $\phi_y$ vertical and Z horizontal.


Thus, Figure 7 shows essentially the deviation between the center pixel for sensor B relative to sensor A as the true depth to the point varies. Figure 7a presents the data as a surface plot and Figure 7b as a contour plot.

While the surface plot is more suggestive of shape, the contour plot may be thought of as delineating pairs of values $(Z, \phi_y)$ such that the maximum pixel error $\Delta$ does not exceed a threshold.

Figure 7b shows the accuracy of modelling small errors in rotation with planar translations. Under the assumption that the sensors are near-boresight aligned, the errors introduced by using the translation mapping are quite small. As is to be expected, as the near-boresight constraint is relaxed, increasing amounts of error are introduced.
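
For the center pixel the error has a simple closed form, sketched below in Python. It follows from equations 71, 73 and 78 under the added assumptions that $\phi_x = 0$ and that the viewed point lies on sensor A's optical axis at distance r, so that it projects to the center of A and to $s_u \tan\phi_y (D - r)/r$ in B. The constraints actually used for Figure 7 are not reproduced here, the scale factor is a hypothetical placeholder rather than the Table 2 value, and the function names are illustrative.

```python
import math

def center_pixel_error(phi_y, r, track_depth, su):
    """Pixel offset in B of a point on A's optical axis at distance r (phi_x = 0)."""
    return abs(su * math.tan(phi_y) * (track_depth - r) / r)

def max_rotation_for(threshold, r, track_depth, su):
    """Largest rotation (radians) keeping the center-pixel error within threshold."""
    if r == track_depth:
        return math.pi / 2.0  # the error vanishes at the tracking depth
    return math.atan(threshold * r / (su * abs(track_depth - r)))

SU = 1000.0   # hypothetical scale factor in pixels
D = 100.0     # tracking depth in meters

for r in (50.0, 80.0, 100.0, 150.0):
    err = center_pixel_error(math.radians(0.2), r, D, SU)
    limit = math.degrees(max_rotation_for(1.0, r, D, SU))
    print(f"r = {r:6.1f} m: error {err:.3f} px at 0.2 deg, 1 px limit at {limit:.3f} deg")
```

Each row gives one point on a contour of the kind plotted in Figure 7b: a depth together with the rotation at which the error reaches a chosen threshold.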
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